A document consists of structured text: sentences, paragraphs, and meaningful phrases. Bag-of-words techniques ignore this structure, reducing the document to the set of its words with a frequency count for each. This reduced representation is used to compute similarity metrics between documents, such as Jaccard similarity and cosine similarity.
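As a rough illustration, the following Python sketch builds a bag of words for two short strings and compares them with both metrics. The function names (`bag_of_words`, `jaccard_similarity`, `cosine_similarity`) and the whitespace tokenizer are assumptions made for this example, not something the text prescribes. Here Jaccard similarity is taken over the sets of distinct words, and cosine similarity over the frequency counts.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map each word to its frequency, discarding order and structure.

    Tokenization is a naive lowercase whitespace split (an assumption
    made for this sketch; real systems usually handle punctuation too).
    """
    return Counter(text.lower().split())


def jaccard_similarity(bag_a, bag_b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    words_a, words_b = set(bag_a), set(bag_b)
    union = words_a | words_b
    if not union:
        return 0.0
    return len(words_a & words_b) / len(union)


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between the two word-count vectors."""
    shared = bag_a.keys() & bag_b.keys()
    dot = sum(bag_a[w] * bag_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = bag_of_words("the cat sat on the mat")
doc2 = bag_of_words("the dog sat on the log")

print(f"Jaccard: {jaccard_similarity(doc1, doc2):.3f}")  # 3 shared words of 7 distinct
print(f"Cosine:  {cosine_similarity(doc1, doc2):.3f}")   # dot product 6, both norms sqrt(8)
```

The two sentences share three of seven distinct words, so the Jaccard similarity is 3/7. The cosine similarity comes out higher, at 6/8, because the repeated word "the" carries extra weight in the count vectors.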
Used on page 213